1 Introduction

In this tutorial, we will learn how to set up an R Markdown document for reproducible statistical analysis and modeling, primarily using functions and syntax from the knitr package. Please ensure you have downloaded the Chapter 9 folder, as this .Rmd depends on that file structure to run.

1.1 Aim

In this tutorial, you will learn methods to ensure your analysis is reproducible from the moment you start coding to the moment it is published. You will learn how to set up pipelines that rerun your analysis and present results every time you knit, different methods to dynamically connect data gathering and source code to your output documents, and how to keep markdown documents up to date without overloading your computer when your analysis is computationally intensive.

1.2 Why is this important?

Coding is hard. It takes a long time, and you’ll likely have to go back to the same analysis multiple times to tweak and change things. It is important to set up a good workflow early so that you’re not scrambling to make your research reproducible at the end of the process. You’ll save yourself a lot of time if your analysis is reproducible and your results are presented the way you want them every time you knit your document.

2 Learning Outcomes

Through this tutorial, students will learn to:

  1. Analyze data in a code chunk reproducibly.
  2. Dynamically report variables in text.
  3. Source or code analysis through multiple methods.
  4. Ensure results are reproducibly random.
  5. Run computationally intensive analysis efficiently.

2.1 Schedule

  • Tuesday: we will ensure that everyone has downloaded the right data and packages, go over code chunk setup, and go over different ways to source/code analysis (LOs 1-3).
  • Thursday: we will learn how to ensure analysis is reproducibly random and how to efficiently run computationally-intensive analysis (LOs 4-5).

3 Packages

Make sure you have the following packages: "knitr", "rmarkdown", "bookdown", "formattable", "kableExtra", "dplyr", "magrittr", "prettydoc", "htmltools", "knitcitations", "bibtex", "devtools", "tidyverse", "ebirdst", "sf", "raster", "terra", "patchwork", "tigris", "rnaturalearth", "ggplot2", "rnaturalearthhires". Below are the code chunks for loading packages and setting up the document from Ch. 2:

###~~~
# Load R packages
###~~~

##~~~
#1. Create a vector of the required R packages
##~~~
# --> If you have a new dependency, don't forget to add it to this vector
pkg <- c("knitr", "rmarkdown", "bookdown", "formattable", "kableExtra", "dplyr", "magrittr", "prettydoc", "htmltools", "knitcitations", "bibtex", "devtools", "tidyverse", "ebirdst", "sf", "raster", "terra", "patchwork", "tigris", "rnaturalearth", "ggplot2", "rnaturalearthhires")

##~~~
#2. Check if pkg are already installed on your computer
##~~~
print("Check if packages are installed")
# The line below builds a vector of the packages that are not yet installed.
# If everything is already installed, this vector will be empty (length 0).
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]

##~~~
#3. Install missing packages
##~~~
# Use an if/else statement to check whether packages have to be installed
# WARNING: If your target R package is not deposited on CRAN then you need to adjust code/function
if(length(new.pkg) > 0){
  print(paste("Install missing package(s):", new.pkg, sep=' '))
  install.packages(new.pkg, dependencies = TRUE)
}else{
  print("All packages are already installed!")
}

##~~~
#4. Load all required packages
##~~~
print("Load packages and return status")
#Here we use the sapply() function to require all the packages
# To know more about the function type ?sapply() in R console
# Just an easier way to load all packages
sapply(pkg, require, character.only = TRUE)
# Debugging tip: the right-pointing arrow next to a chunk runs it in the
# console so you can go line by line. Running interactively this way can
# also resolve the "CRAN needs a mirror" error.
# Generate a BibTeX citation file for all loaded R packages, used to
# produce the report. Notice the syntax used here to call the function.
knitr::write_bib(.packages(), file = "packages.bib")
# The .packages() function invisibly returns the names of all packages loaded
# in the current R session (to see the output, use .packages(all.available = TRUE)).
# This ensures that all packages used in your code have citation entries written
# to the .bib file, which is then used in the Appendix to cite the code used.

Set up the chunk options:

### Chunk options: see http://yihui.name/knitr/options/ ###
### Text output
opts_chunk$set(echo = TRUE, warning = TRUE, message = TRUE, include = TRUE)
## Code formatting
opts_chunk$set(tidy = TRUE, tidy.opts = list(blank = FALSE, width.cutoff = 60),
    highlight = TRUE)
## Code caching
opts_chunk$set(cache = 2, cache.path = "cache/")
## Plot output (the first dev listed is the master for the
## output document)
opts_chunk$set(fig.path = "Figures_MS/", dev = c("png", "pdf"),
    dpi = 300)
## Figure positioning
opts_chunk$set(fig.pos = "H")

4 Data

The data for this tutorial should be in the Ch. 9 folder you downloaded. For this chapter, we will be working with a couple of datasets. Mike is working on consolidating some exploratory analyses he did at different times over the last year. He wants to get them in the same place and make sure that they are reproducible. One set of data comes from the supplementary material of a paper (Schuetz and Johnston, 2019): a csv file pulled from the GitHub repository of one of the paper’s authors. The other dataset comes from the R package ebirdst, which interfaces with data from the community science platform eBird. We’ll start with a dataset of “encounter rates” calculated from eBird data: the percentage of checklists that include a particular species in a given area. These rates have been calculated for the United States by state, and Mike is getting ready to calculate encounter rates for species in Sonora, Mexico that are also found in Idaho. Before digging into the eBird data for Sonora, Mike wanted to get to know the Idaho dataset so that he knows what species to look for in the Sonora data. Let’s get that data ready.

4.1 Load Data

First, please set your working directory to the Chapter 9 folder that you downloaded to your computer. You can do this through the Session menu (Set Working Directory > To Source File Location) or with the command setwd(*your file path here*). Now we are going to do a little bit of data transformation to get the Idaho encounter rate data ready for the analyses that follow.

library("tidyverse")
# load csv file of data
state_enc_rate <- read.csv("01_raw_data/state.level.enc.rate.csv")
summary(state_enc_rate)
##  common.name        scientific.name       state           query.volume.state
##  Length:31722       Length:31722       Length:31722       Min.   :  0.00    
##  Class :character   Class :character   Class :character   1st Qu.:  0.00    
##  Mode  :character   Mode  :character   Mode  :character   Median :  0.00    
##                                                           Mean   : 10.35    
##                                                           3rd Qu.:  8.00    
##                                                           Max.   :100.00    
##  encounter.state.normalized
##  Min.   :  0.0000          
##  1st Qu.:  0.0000          
##  Median :  0.3636          
##  Mean   : 15.0947          
##  3rd Qu.: 21.4322          
##  Max.   :100.0000
head(state_enc_rate)
##      common.name scientific.name      state query.volume.state
## 1 Abert's towhee Melozone aberti    Alabama                  0
## 2 Abert's towhee Melozone aberti     Alaska                  0
## 3 Abert's towhee Melozone aberti    Arizona                100
## 4 Abert's towhee Melozone aberti   Arkansas                  0
## 5 Abert's towhee Melozone aberti California                  0
## 6 Abert's towhee Melozone aberti   Colorado                  0
##   encounter.state.normalized
## 1                   0.000000
## 2                   0.000000
## 3                 100.000000
## 4                   0.000000
## 5                   2.357214
## 6                   0.000000

There is more data in this dataframe than we need, so let’s do some transformation to get the data where we want it. Specifically, we are going to drop a column we don’t need, filter the data down to just Idaho, and keep only species with encounter rates higher than 50%.

# Get rid of unnecessary columns
state_enc_rate_clean <- state_enc_rate %>%
    dplyr::select(-query.volume.state)  # not needed for analysis
view(state_enc_rate_clean)
# filter to just Idaho
state_enc_rate_clean_filtered <- state_enc_rate_clean %>%
    filter(state == "Idaho")
view(state_enc_rate_clean_filtered)
# get rid of species with no encounter rate
ID_enc_rate <- state_enc_rate_clean_filtered %>%
    filter(encounter.state.normalized != 0)
# drop encounter rates below 50 to focus on commonly
# encountered species
ID_enc_rate <- ID_enc_rate %>%
    filter(encounter.state.normalized > 50)
# check size of final dataframe
summary(ID_enc_rate)
##  common.name        scientific.name       state          
##  Length:63          Length:63          Length:63         
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  encounter.state.normalized
##  Min.   : 50.01            
##  1st Qu.: 58.34            
##  Median : 69.38            
##  Mean   : 71.40            
##  3rd Qu.: 83.32            
##  Max.   :100.00
# make a table for markdown output
kable(ID_enc_rate)
common.name scientific.name state encounter.state.normalized
American coot Fulica americana Idaho 55.75646
American kestrel Falco sparverius Idaho 100.00000
American robin Turdus migratorius Idaho 72.07650
American wigeon Mareca americana Idaho 60.66470
bank swallow Riparia riparia Idaho 63.75313
belted kingfisher Megaceryle alcyon Idaho 50.01036
black-billed magpie Pica hudsonia Idaho 83.85025
black-headed grosbeak Pheucticus melanocephalus Idaho 57.82899
Brewer’s blackbird Euphagus cyanocephalus Idaho 66.23826
Bullock’s oriole Icterus bullockii Idaho 72.90967
California quail Callipepla californica Idaho 100.00000
calliope hummingbird Selasphorus calliope Idaho 92.01797
Canada goose Branta canadensis Idaho 69.74989
Cassin’s finch Haemorhous cassinii Idaho 78.77421
Cassin’s vireo Vireo cassinii Idaho 100.00000
cedar waxwing Bombycilla cedrorum Idaho 64.66979
chukar Alectoris chukar Idaho 52.27404
cinnamon teal Spatula cyanoptera Idaho 50.02885
cliff swallow Petrochelidon pyrrhonota Idaho 62.71355
common goldeneye Bucephala clangula Idaho 73.44130
common merganser Mergus merganser Idaho 77.87708
common raven Corvus corax Idaho 56.20139
dark-eyed junco Junco hyemalis Idaho 70.30395
dusky flycatcher Empidonax oberholseri Idaho 88.66510
Eurasian collared-dove Streptopelia decaocto Idaho 68.98302
European starling Sturnus vulgaris Idaho 60.81733
evening grosbeak Coccothraustes vespertinus Idaho 53.82014
great horned owl Bubo virginianus Idaho 99.39787
Hammond’s flycatcher Empidonax hammondii Idaho 100.00000
house finch Haemorhous mexicanus Idaho 60.00921
killdeer Charadrius vociferus Idaho 56.41970
lazuli bunting Passerina amoena Idaho 85.12329
Lewis’s woodpecker Melanerpes lewis Idaho 71.66795
long-eared owl Asio otus Idaho 66.75941
MacGillivray’s warbler Geothlypis tolmiei Idaho 76.69511
mallard Anas platyrhynchos Idaho 87.65577
mountain chickadee Poecile gambeli Idaho 63.37515
northern flicker Colaptes auratus Idaho 86.56977
northern harrier Circus cyaneus Idaho 71.13981
northern pygmy-owl Glaucidium gnoma Idaho 69.37824
northern rough-winged swallow Stelgidopteryx serripennis Idaho 58.52182
northern saw-whet owl Aegolius acadicus Idaho 73.17100
olive-sided flycatcher Contopus cooperi Idaho 55.03477
pine siskin Spinus pinus Idaho 78.22397
prairie falcon Falco mexicanus Idaho 73.68943
red crossbill Loxia curvirostra Idaho 52.45629
red-breasted nuthatch Sitta canadensis Idaho 89.29945
red-tailed hawk Buteo jamaicensis Idaho 83.58497
red-winged blackbird Agelaius phoeniceus Idaho 65.22796
ring-necked duck Aythya collaris Idaho 87.38486
rock pigeon Columba livia Idaho 51.20198
sage thrasher Oreoscoptes montanus Idaho 53.83167
sharp-shinned hawk Accipiter striatus Idaho 53.95897
song sparrow Melospiza melodia Idaho 58.15657
spotted sandpiper Actitis macularius Idaho 76.10001
Swainson’s hawk Buteo swainsoni Idaho 95.49056
western grebe Aechmophorus occidentalis Idaho 67.58516
western kingbird Tyrannus verticalis Idaho 53.46191
western screech-owl Megascops kennicottii Idaho 83.04902
western tanager Piranga ludoviciana Idaho 98.95006
western wood-pewee Contopus sordidulus Idaho 67.01473
willow flycatcher Empidonax traillii Idaho 57.60302
yellow warbler Setophaga petechia Idaho 67.49039
# create clean data folder and save new data file
dir.create("02_Clean_data", showWarnings = FALSE)
write.csv(ID_enc_rate, file = "02_Clean_data/ID_enc_rate.csv",
    row.names = FALSE)

5 Analyze data in a code chunk reproducibly

Although dynamically linking source code to markdown documents is the main goal of this chapter, sometimes it makes sense to include short code chunks directly in your markdown document. knitr provides a variety of customizable options that control how a chunk’s code and output are displayed in the final knitted document.

5.1 Set up a chunk’s outputs correctly

First, we will learn how to set up a code chunk. We use code chunk options/arguments from the knitr package to choose which parts of our code chunks appear in the final html document.

Code chunk options/arguments:

  • include = FALSE hides code and results from the output document (html in our case), but code will still run
  • eval = FALSE includes code from the document without running it
  • echo = FALSE hides code but not results from the document
  • results = 'hide' hides results but not code from the document
  • warning, message, error = FALSE hides warnings, messages, and error messages from the document
  • fig.show = 'hide' hides figures specifically from the final output document

For eval and echo, you can tell knitr which expressions in the chunk to include using numeric vectors. For example, eval = 1:2 will evaluate only the first two expressions of code in the chunk.
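For instance, a chunk header combining several of these options might look like the following (the chunk label summary-stats and its body are hypothetical, chosen here only to illustrate the syntax):

````markdown
```{r summary-stats, echo=FALSE, warning=FALSE, message=FALSE}
# The code runs and its results appear in the html,
# but the code itself and any warnings/messages are hidden
summary(ID_enc_rate$encounter.state.normalized)
```
````

The options sit inside the curly braces after the chunk label, separated by commas.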

5.1.1 Thought Exercise:

Take 5 minutes to discuss with the person sitting next to you. Imagine you have an R markdown document with chunks of analysis. You are now sending just the html document outputted from it to your PI for review. What chunk arguments might you set? What if you are publishing the final version of that html document? Now share with the class.

5.2 Analysis in a code chunk

Code chunks are best used when the code is simple or short. To demonstrate analysis in a code chunk for this section, we are going to do a little bit more data transformation to make a couple of objects of the Idaho encounter rate data grouped taxonomically for some descriptive statistics we will use later. By the end of this tutorial, we will create an analysis to compare the mean encounter rate of songbirds and waterfowl in the Idaho dataset.

# Pull passerines from ID encounter rates list
passerine_enc_rate <- ID_enc_rate %>%
    filter(common.name %in% c("American robin", "bank swallow",
        "black-billed magpie", "black-headed grosbeak", "Brewer's blackbird",
        "Bullock's oriole", "Cassin's finch", "Cassin's vireo",
        "cedar waxwing", "cliff swallow", "dark-eyed junco",
        "dusky flycatcher", "European starling", "evening grosbeak",
        "Hammond's flycatcher", "house finch", "lazuli bunting",
        "MacGillivray's warbler", "mountain chickadee", "northern rough-winged swallow",
        "olive-sided flycatcher", "pine siskin", "red crossbill",
        "red-winged blackbird", "sage thrasher", "song sparrow",
        "western kingbird", "western tanager", "western wood-pewee",
        "willow flycatcher", "yellow warbler"))
waterfowl_enc_rate <- ID_enc_rate %>%
    filter(common.name %in% c("American coot", "American wigeon",
        "cinnamon teal", "common goldeneye", "common merganser",
        "mallard", "ring-necked duck", "western grebe"))

6 Dynamically report variables in your text

Sometimes you may want to have R code or output appear inline with the rest of the markdown document. These can be static or dynamic.

  • Static: When you want to display a line of code in the markdown document.
    • Example: Here is a line of code from the data prep for this lesson: ID_enc_rate <- state_enc_rate_clean_filtered %>% filter(encounter.state.normalized != 0)
  • Dynamic: When you want to display output of a line of code inline with the rest of the markdown document.
    • Example: The average encounter rate for the most commonly encountered passerines in Idaho is 69.2773162.
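In the .Rmd source, inline output like the dynamic example above is produced with inline R code: a single-backtick span that begins with r is evaluated when the document is knit, while a plain backtick span is displayed verbatim. A minimal sketch of the syntax (the object passerine_enc_rate is created later in this tutorial):

```markdown
Static (displayed verbatim, not evaluated):
`mean(passerine_enc_rate$encounter.state.normalized)`

Dynamic (evaluated at knit time; the result replaces the expression):
The average encounter rate is `r mean(passerine_enc_rate$encounter.state.normalized)`.
```

When the document is knit, the dynamic span is replaced by the computed number, so the sentence always reflects the current data.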

7 Source or code analysis through multiple methods

Bogging down markdown documents with lots of long code chunks can impair readability. This is where modular analysis comes in. Modular analysis is a way to dynamically include analysis in your .rmd by keeping the script for an analysis in a separate file and then calling it from the .rmd. Changing the script file will change the output in the .rmd, and that output can be used for further analysis in the .rmd. Linking to a separate script is also useful when you need the same code for multiple output documents, or when you want to make it easier for other researchers to find specific parts of your analysis. We will go over two ways to dynamically include analysis in your R Markdown document: locally sourced modular analysis and url-sourced modular analysis.

7.1 Locally sourced modular analysis

We can link to previously written code stored locally on your computer that you want to pull into an analysis. To do so, we save that code in its own R file and call it with the source() function from base R. This method is best suited to the early stages of your research, since others won’t have access to your local files.

Here is a script that was used to produce a figure. It is also made from eBird data, but instead of starting with someone else’s calculations, we are interacting with the package ebirdst, which pulls data from Cornell’s eBird database.

# Note eval = FALSE: this code chunk won't actually run. We
# are going to use an R file with the same code for this
# exercise.
# Load packages
library(ggplot2)
library(sf)
library(raster)
library(terra)
library(patchwork)
library(tigris)
library(rnaturalearth)
library(ebirdst)
# Set an access key to use data from the ebirdst package
set_ebirdst_access_key("aj6vlciqa1gd", overwrite = TRUE)
# Get the state boundary for Sonora, Mexico
region_boundary <- ne_states(iso_a2 = "MX", returnclass = "sf") |>
    filter(name == "Sonora")
# download data if they haven't already been downloaded
# only weekly 3km relative abundance, median and confidence
# limits
ebirdst_download_status("ameavo", pattern = "abundance_(median|upper|lower)_3km")
# load the median weekly relative abundance and lower/upper
# confidence limits
abd_median <- load_raster("ameavo", product = "abundance", metric = "median")
abd_lower <- load_raster("ameavo", product = "abundance", metric = "lower")
abd_upper <- load_raster("ameavo", product = "abundance", metric = "upper")
# project region boundary to match raster data
region_boundary_proj <- st_transform(region_boundary, st_crs(abd_median))
# extract values within region and calculate the mean
abd_median_region <- extract(abd_median, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
abd_lower_region <- extract(abd_lower, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
abd_upper_region <- extract(abd_upper, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
# transform to data frame format with rows corresponding to
# weeks
chronology_SO <- data.frame(week = as.Date(names(abd_median)),
    median = as.numeric(abd_median_region), lower = as.numeric(abd_lower_region),
    upper = as.numeric(abd_upper_region))
# Make a graph
AMAV_SO <- ggplot(chronology_SO) + aes(x = week, y = median) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "green",
        alpha = 0.4) + geom_line(color = "purple") + scale_x_date(date_labels = "%b",
    date_breaks = "1 month") + theme_classic() + labs(x = "Week",
    y = "Mean relative abundance in Sonora", title = "Migration chronology for American Avocet in Sonora")
AMAV_SO

7.2 Exercise

We have put the code from the chunk above into a separate R file in the Ch. 9 folder. Now we are going to call it with the source() function. Try running the following code chunk:

# Run main analysis
source("03_linked_scripts/Sonora_AMAV_migration_chron.R")
## eBird Status and Trends access key stored in: C:/Users/rosey/.Renviron
## The rnaturalearthhires package needs to be installed.
## Installing the rnaturalearthhires package.
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install Rtools 4.5 from https://cran.r-project.org/bin/windows/Rtools/.
## 
## Using GitHub PAT from the git credential store.
## 
## Downloading GitHub repo ropensci/rnaturalearthhires@HEAD
## ── R CMD build ─────────────────────────────────────────────────────────────────
## WARNING: Rtools is required to build R packages, but is not currently installed.
## 
## Please download and install Rtools 4.5 from https://cran.r-project.org/bin/windows/Rtools/.
##          checking for file 'C:\Users\rosey\AppData\Local\Temp\RtmpS2BsBb\remotesa96c2623458c\ropensci-rnaturalearthhires-e4736f6/DESCRIPTION' ...  ✔  checking for file 'C:\Users\rosey\AppData\Local\Temp\RtmpS2BsBb\remotesa96c2623458c\ropensci-rnaturalearthhires-e4736f6/DESCRIPTION' (348ms)
##       ─  preparing 'rnaturalearthhires': (427ms)
##    checking DESCRIPTION meta-information ...  ✔  checking DESCRIPTION meta-information
##       ─  checking for LF line-endings in source and make files and shell scripts
##   ─  checking for empty or unneeded directories
##       ─  building 'rnaturalearthhires_1.0.0.9000.tar.gz'
##      
## 
## Installing package into 'C:/Users/rosey/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## Downloading Status Data Products for ameavo
## Data already exists, use force = TRUE to re-download.
AMAV_SO

7.3 Url-sourced modular analysis

Once you are further along in your research, you may want to make your file publicly accessible to ensure reproducibility. You can host the file in a GitHub repository and use the source_url command from the devtools package to call it.

We have set up a pre-made GitHub file for this section. It contains the following code:

region_boundary <- ne_states(iso_a2 = "US") |>
    filter(name == "Idaho")
# download data if they haven't already been downloaded
# only weekly 3km relative abundance, median and confidence
# limits
ebirdst_download_status("ameavo", pattern = "abundance_(median|upper|lower)_3km")
# load the median weekly relative abundance and lower/upper
# confidence limits
abd_median <- load_raster("ameavo", product = "abundance", metric = "median")
abd_lower <- load_raster("ameavo", product = "abundance", metric = "lower")
abd_upper <- load_raster("ameavo", product = "abundance", metric = "upper")
# project region boundary to match raster data
region_boundary_proj <- st_transform(region_boundary, st_crs(abd_median))
# extract values within region and calculate the mean
abd_median_region <- extract(abd_median, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
abd_lower_region <- extract(abd_lower, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
abd_upper_region <- extract(abd_upper, region_boundary_proj,
    fun = "mean", na.rm = TRUE, ID = FALSE)
# transform to data frame format with rows corresponding to
# weeks
chronology_ID <- data.frame(week = as.Date(names(abd_median)),
    median = as.numeric(abd_median_region), lower = as.numeric(abd_lower_region),
    upper = as.numeric(abd_upper_region))
AMAV_ID <- ggplot(chronology_ID) + aes(x = week, y = median) +
    geom_ribbon(aes(ymin = lower, ymax = upper), fill = "green",
        alpha = 0.4) + geom_line(color = "purple") + scale_x_date(date_labels = "%b",
    date_breaks = "1 month") + theme_classic() + labs(x = "Week",
    y = "Mean relative abundance in Idaho", title = "Migration chronology for American Avocet in Idaho")
AMAV_ID

7.4 Exercise

Use the source_url command from the devtools package to call the file. Run the following code chunk.

source_url("https://raw.githubusercontent.com/rosean829/Rep-Sci-Chapter-9/main/03_linked_scripts/Idaho_AMAV_migration_chronology.R")
## ℹ SHA-1 hash of file is "dd74275f1b0a1301afdd9da8cf3db53c2fca5761"
## Downloading Status Data Products for ameavo
## 
## Data already exists, use force = TRUE to re-download.
AMAV_ID

8 Ensure results are reproducibly random

When an analysis produces simulations, random number generation makes the results differ from run to run; setting the random seed lets others replicate your results exactly. For this tutorial, we will do an exercise to learn the importance of using the set.seed command to ensure replicability.

8.0.1 Exercise:

You and the person next to you will use the same distribution parameters but different seeds. Do you get the same result? What about when you use the same seed?

Have the person sitting on the left run the following simulation and simple regression with set.seed(125). The person on the right will run the next code chunk. Compare your results.

## The person on the left of each pair will set seed as 125
set.seed(125)
# Draw 1000 numbers
Draw1 <- rnorm(1000, mean = 0, sd = 2)
# Draw 1000 other numbers
Draw2 <- rnorm(1000, mean = 5, sd = 1.5)
# Plot Draw1 and Draw2
plot(Draw2)

plot(Draw1)

# Create a regression with Draw1 and Draw2, then plot it.
model <- lm(Draw1 ~ Draw2)
plot(model)

## Save the 1st regression in its own Rdata file (this will
## be relevant in a later section)
save(model, file = "model.RData")

The person sitting on the right will run the following code chunk (the same code as the chunk above but with set.seed(150)). After comparing your results with your partner, run the set.seed(125) chunk yourself and compare again.

## The person on the right of each pair will set seed as 150
set.seed(150)
# Draw 1000 numbers
Draw3 <- rnorm(1000, mean = 0, sd = 2)
# Draw 1000 other numbers
Draw4 <- rnorm(1000, mean = 5, sd = 1.5)
# Plot both Draw3 and Draw4
plot(Draw3)

plot(Draw4)

# Create a regression from Draw3 and Draw4, then plot it.
model1 <- lm(Draw3 ~ Draw4)
plot(model1)

9 Run computationally intensive analysis efficiently

When an analysis is too computationally intensive to rerun on every knit, the caching options from knitr are helpful. To cache a code chunk, use the argument cache = TRUE. A cached chunk will only rerun if the chunk’s contents or options change.

However, caching only restores the objects a chunk created; side effects, such as packages loaded inside the chunk, are not restored from the cache (all packages should be loaded in their own chunk anyway). In addition, a cached chunk will not rerun just because an earlier chunk it depends on changes. You can solve this problem with the argument dependson: a chunk with dependson has its cache invalidated, and reruns, whenever the specified chunks change. You can specify chunks with a vector of their labels or their position from the start of the document. For example, dependson = c(2, 3) will rerun the cached chunk whenever the 2nd or 3rd chunk of the document is rerun.
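A sketch of how cache and dependson fit together in chunk headers (the chunk labels sim-draws and fit-model are hypothetical, for illustration only):

````markdown
```{r sim-draws, cache=TRUE}
set.seed(125)
Draw1 <- rnorm(1000, mean = 0, sd = 2)
Draw2 <- rnorm(1000, mean = 5, sd = 1.5)
```

```{r fit-model, cache=TRUE, dependson="sim-draws"}
# This chunk's cache is invalidated, and the chunk reruns,
# whenever the cached results of sim-draws change
model <- lm(Draw1 ~ Draw2)
```
````

Referring to the dependency by label rather than by position is safer, since chunk numbers shift when you add or remove chunks earlier in the document.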

9.0.1 Exercise

The two code chunks from earlier have the argument cache = TRUE. If you haven’t knit the document yet, knit it now. If you have, do you remember how long the knit took? Now that you’ve knit once, knit the document again. What happens to the knit time?

9.0.2 Exercise

You can also save objects created by a cached chunk to a separate .RData file and load them in subsequent chunks with the load command, if you want to keep using the outputs of the cached chunk even when the chunk doesn’t run. The last line of code in one of the previous chunks (save(model, file = "model.RData")) saved its output in an .RData file. In the next chunk, we load and then summarize and plot that output without rerunning the chunk that created it:

# Load the saved model
load(file = "model.RData")
summary(model)
## 
## Call:
## lm(formula = Draw1 ~ Draw2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0946 -1.2920  0.0146  1.4463  5.8039 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.22228    0.22736  -0.978    0.328
## Draw2        0.01989    0.04307   0.462    0.644
## 
## Residual standard error: 2.016 on 998 degrees of freedom
## Multiple R-squared:  0.0002136,  Adjusted R-squared:  -0.0007882 
## F-statistic: 0.2132 on 1 and 998 DF,  p-value: 0.6444
plot(model)

10 References

Schuetz, J.G., and A. Johnston. 2019. Characterizing the cultural niches of North American birds. Proceedings of the National Academy of Sciences 116: 10868–10873. Available at: https://pnas.org/doi/full/10.1073/pnas.1820670116 [Accessed October 26, 2025].

Appendices

A Appendix 1

Citations of all R packages used to generate this report, read and printed from packages.bib.

### Load R package
library("knitcitations")
### Process and print citations in packages.bib
### Clear any bibliography that could be in the cache
cleanbib()
# Set pandoc as the default output option for bib
options(citation_format = "pandoc")
# Read and print bib from file
read.bibtex(file = "packages.bib")

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.30. 2025. https://github.com/rstudio/rmarkdown.

[2] S. M. Bache and H. Wickham. magrittr: A Forward-Pipe Operator for R. R package version 2.0.4. 2025. https://magrittr.tidyverse.org.

[3] R. S. Bivand, E. Pebesma, and V. Gomez-Rubio. Applied spatial data analysis with R, Second edition. Springer, NY, 2013. https://asdar-book.org/.

[4] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.

[5] J. Cheng, C. Sievert, B. Schloerke, et al. htmltools: Tools for HTML. R package version 0.5.8.1. 2024. https://github.com/rstudio/htmltools.

[6] R. Francois and D. Hernangómez. bibtex: Bibtex Parser. R package version 0.5.1. 2023. https://github.com/ropensci/bibtex.

[7] G. Grolemund and H. Wickham. “Dates and Times Made Easy with lubridate”. In: Journal of Statistical Software 40.3 (2011), pp. 1-25. https://www.jstatsoft.org/v40/i03/.

[8] R. J. Hijmans. raster: Geographic Data Analysis and Modeling. R package version 3.6-32. 2025. https://rspatial.org/raster.

[9] R. J. Hijmans. terra: Spatial Data Analysis. R package version 1.8-70. 2025. https://rspatial.org/.

[10] P. Massicotte and A. South. rnaturalearth: World Map Data from Natural Earth. R package version 1.1.0. 2025. https://docs.ropensci.org/rnaturalearth/.

[11] K. Müller and H. Wickham. tibble: Simple Data Frames. R package version 3.3.0. 2025. https://tibble.tidyverse.org/.

[12] E. Pebesma. “Simple Features for R: Standardized Support for Spatial Vector Data”. In: The R Journal 10.1 (2018), pp. 439-446. DOI: 10.32614/RJ-2018-009. https://doi.org/10.32614/RJ-2018-009.

[13] E. Pebesma. sf: Simple Features for R. R package version 1.0-21. 2025. https://r-spatial.github.io/sf/.

[14] E. J. Pebesma and R. Bivand. “Classes and methods for spatial data in R”. In: R News 5.2 (Nov. 2005), pp. 9-13. https://CRAN.R-project.org/doc/Rnews/.

[15] E. Pebesma and R. Bivand. Spatial Data Science: With applications in R. Chapman and Hall/CRC, 2023. DOI: 10.1201/9780429459016. https://r-spatial.org/book/.

[16] E. Pebesma and R. Bivand. sp: Classes and Methods for Spatial Data. R package version 2.2-0. 2025. https://github.com/edzer/sp/.

[17] T. L. Pedersen. patchwork: The Composer of Plots. R package version 1.3.2. 2025. https://patchwork.data-imaginist.com.

[18] Y. Qiu. prettydoc: Creating Pretty Documents from R Markdown. R package version 0.4.1. 2021. https://github.com/yixuan/prettydoc.

[19] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2025. https://www.R-project.org/.

[20] K. Ren and K. Russell. formattable: Create Formattable Data Structures. R package version 0.2.1. 2021. https://renkun-ken.github.io/formattable/.

[21] V. Spinu, G. Grolemund, and H. Wickham. lubridate: Make Dealing with Dates a Little Easier. R package version 1.9.4. 2024. https://lubridate.tidyverse.org.

[22] M. Strimas-Mackey, S. Ligocki, T. Auer, et al. ebirdst: Access and Analyze eBird Status and Trends Data Products. R package version 3.2023.1. 2025. https://ebird.github.io/ebirdst/.

[23] K. Walker. tigris: Load Census TIGER/Line Shapefiles. R package version 2.2.1. 2025. https://github.com/walkerke/tigris.

[24] H. Wickham. forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.1. 2025. https://forcats.tidyverse.org/.

[25] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. https://ggplot2.tidyverse.org.

[26] H. Wickham. stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.5.2. 2025. https://stringr.tidyverse.org.

[27] H. Wickham. tidyverse: Easily Install and Load the Tidyverse. R package version 2.0.0. 2023. https://tidyverse.tidyverse.org.

[28] H. Wickham, M. Averick, J. Bryan, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[29] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.2.1. 2025. https://usethis.r-lib.org.

[30] H. Wickham, W. Chang, L. Henry, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 4.0.0. 2025. https://ggplot2.tidyverse.org.

[31] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.1.4. 2023. https://dplyr.tidyverse.org.

[32] H. Wickham and L. Henry. purrr: Functional Programming Tools. R package version 1.1.0. 2025. https://purrr.tidyverse.org/.

[33] H. Wickham, J. Hester, and J. Bryan. readr: Read Rectangular Text Data. R package version 2.1.5. 2024. https://readr.tidyverse.org.

[34] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.6. 2025. https://devtools.r-lib.org/.

[35] H. Wickham, D. Vaughan, and M. Girlich. tidyr: Tidy Messy Data. R package version 1.3.1. 2024. https://tidyr.tidyverse.org.

[36] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC, 2016. ISBN: 978-1138700109. https://bookdown.org/yihui/bookdown.

[37] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.45. 2025. https://github.com/rstudio/bookdown.

[38] Y. Xie. Dynamic Documents with R and knitr. 2nd. ISBN 978-1498716963. Boca Raton, Florida: Chapman and Hall/CRC, 2015. https://yihui.org/knitr/.

[39] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.

[40] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. https://yihui.org/knitr/.

[41] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.

[42] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.

[43] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.4.0. 2024. http://haozhu233.github.io/kableExtra/.

B Appendix 2

Version information about R, the operating system (OS), and all attached or loaded R packages. This appendix was generated using sessionInfo().

# Load and provide all packages and versions
sessionInfo()
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Denver
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rnaturalearthhires_1.0.0.9000 rnaturalearth_1.1.0          
##  [3] tigris_2.2.1                  patchwork_1.3.2              
##  [5] terra_1.8-70                  raster_3.6-32                
##  [7] sp_2.2-0                      sf_1.0-21                    
##  [9] ebirdst_3.2023.1              lubridate_1.9.4              
## [11] forcats_1.0.1                 stringr_1.5.2                
## [13] purrr_1.1.0                   readr_2.1.5                  
## [15] tidyr_1.3.1                   tibble_3.3.0                 
## [17] ggplot2_4.0.0                 tidyverse_2.0.0              
## [19] devtools_2.4.6                usethis_3.2.1                
## [21] bibtex_0.5.1                  knitcitations_1.0.12         
## [23] htmltools_0.5.8.1             prettydoc_0.4.1              
## [25] magrittr_2.0.4                dplyr_1.1.4                  
## [27] kableExtra_1.4.0              formattable_0.2.1            
## [29] bookdown_0.45                 rmarkdown_2.30               
## [31] knitr_1.50                   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1   viridisLite_0.4.2  farver_2.1.2       S7_0.2.0          
##  [5] fastmap_1.2.0      digest_0.6.37      timechange_0.3.0   lifecycle_1.0.4   
##  [9] ellipsis_0.3.2     compiler_4.5.1     rlang_1.1.6        sass_0.4.10       
## [13] tools_4.5.1        yaml_2.3.10        htmlwidgets_1.6.4  pkgbuild_1.4.8    
## [17] classInt_0.4-11    plyr_1.8.9         xml2_1.4.1         RColorBrewer_1.1-3
## [21] pkgload_1.4.1      KernSmooth_2.23-26 withr_3.0.2        grid_4.5.1        
## [25] e1071_1.7-16       scales_1.4.0       cli_3.6.5          generics_0.1.4    
## [29] remotes_2.5.0      rstudioapi_0.17.1  httr_1.4.7         tzdb_0.5.0        
## [33] sessioninfo_1.2.3  DBI_1.2.3          cachem_1.1.0       proxy_0.4-27      
## [37] formatR_1.14       vctrs_0.6.5        jsonlite_2.0.0     hms_1.1.4         
## [41] systemfonts_1.3.1  jquerylib_0.1.4    units_1.0-0        glue_1.8.0        
## [45] RefManageR_1.4.0   codetools_0.2-20   stringi_1.8.7      gtable_0.3.6      
## [49] pillar_1.11.1      rappdirs_0.3.3     R6_2.6.1           textshaping_1.0.4 
## [53] evaluate_1.0.5     lattice_0.22-7     backports_1.5.0    memoise_2.0.1     
## [57] bslib_0.9.0        class_7.3-23       Rcpp_1.1.0         uuid_1.2-1        
## [61] svglite_2.2.2      xfun_0.53          fs_1.6.6           pkgconfig_2.0.3